The OmpA of commensal Escherichia coli of CRC patients affects apoptosis of the HCT116 colon cancer cell line

Background: Colorectal cancer (CRC) is the third most common cancer worldwide. Dysbiosis of the gut microbiota is one of the factors that contribute to tumorigenesis and metastasis in CRC. Escherichia coli, a component of the gut microbiota, is more abundant in the gut of people with CRC than in healthy people, so E. coli strains isolated from these patients may have a role in tumorigenesis. Because most of the isolated strains belong to the B2 phylogenetic group, there seems to be a link between components of the bacterium and malignancy.
Material and methods: In this study, the proteomes of E. coli isolated from CRC patients and from healthy people were compared. The isolated spot was studied by two-dimensional gel electrophoresis (2DE) and liquid chromatography-mass spectrometry (LC–MS). The results showed that the expression of outer membrane protein A (OmpA) was increased in commensal E. coli of the B2 phylogenetic group isolated from CRC patients. Additionally, we analyzed the effect of the OmpA protein on the expression of four apoptosis-related genes in the HCT116 colon cancer cell line.
Results: OmpA was overexpressed in commensal E. coli of the B2 phylogenetic group isolated from CRC patients compared with E. coli from the control group. This protein significantly decreased the expression of the pro-apoptotic genes Bax and Bak, as well as the expression of P53, in the HCT116 cell line (P < 0.0001). LC–MS and protein bioinformatics results confirmed that this protein is outer membrane protein A, which can bind to nucleic acid and to some of the organelle proteins on the eukaryotic cell surface.
Conclusions: According to our in vitro and in silico investigations, OmpA of gut E. coli strains that belong to the B2 phylogenetic group can affect the eukaryotic cell cycle.


Background
Colorectal cancer (CRC) is the third most common cancer globally, affecting 1.36 million people each year [1][2][3]. CRC has a poor prognosis [4], and only a small number of people are diagnosed in the early stages of the disease. Therefore, early diagnosis, which allows proper treatment and prevention of malignancy, is essential [5]. Many factors, such as genetics, lifestyle, and environment, play an important role in increasing the incidence of CRC [6][7][8]. A change in the diversity of the gut microbiota population, or digestive dysbiosis, has been shown to affect the emergence and progression of malignancy in intestinal cells [1,2,9]. The human body lives in symbiosis with its microbiota [3,5].
Studies have found that the Escherichia coli (E. coli) load is increased in the intestines of people with colon cancer, and that the pathogenic mechanisms and the types of toxins produced differ from those of intestinal isolates from healthy people [9,10]. This bacterium is divided into four major phylogenetic groups: A, B1, B2, and D. Most isolates with more severe virulence and pathogenicity factors belong to group B2 and, after it, group D [10,11]. Most strains of the B2 phylogenetic group can prevent progression of the apoptosis pathway in colon cells. Consequently, they can contribute to the high proliferation of these cells and to cancer advancement [12,13]. Group B2 isolates may therefore be considered a real risk factor for the development of colorectal cancer. In research on mucus and biopsy specimens from people with colon cancer and diverticulosis, most of the isolates, especially those containing the target genotoxins, belonged to the B2 phylogenetic group, and the authors concluded that these toxins could play a role in tumorigenesis [14]. According to other reports, diarrhea-causing enteropathogenic E. coli (EPEC) isolates can secrete a protein called EspF, which inhibits the factors involved in DNA mismatch repair [15]. Other studies on the role of the structural and secretory proteins of bacterial isolates in tumor formation have reached similar conclusions [13,16].
To identify a candidate microbial biomarker for CRC screening, we compared the proteomes of E. coli isolated from two groups of people, one with CRC and the other from the healthy population; the comparison yielded a candidate microbial protein biomarker, OmpA. Furthermore, given the importance of this bacterium for the apoptosis pathway of colon cancer cells, we investigated the effect of this protein on the HCT116 cell line to better understand how it changes the expression level of the pathway's essential genes Bak, Bax, Bcl-2, and P53.

Characteristics of samples
Fecal (stool) samples were collected from 20 people (9 females and 11 males) in whom colorectal cancer was found on biopsy during colonoscopy, before any treatment. To allow a statistical comparison of the proteins of E. coli isolated from people with colorectal cancer against the proteome of E. coli isolated from healthy people, further sampling was necessary. Thus, stool samples from 50 healthy people (25 females and 25 males) were collected. Only E. coli strains were isolated from each sample [17].

The uspA gene amplification
Amplification and sequencing of a uspA gene fragment (LC639828) confirmed that all 70 bacterial isolates from the control and patient groups were E. coli.

Identification of E. coli phylogenetic groups
Of the 20 gut microbiota E. coli isolates from the CRC patients, 11 (55%), 7 (35%), 0 (0%), and 2 (10%) belonged to the B2, D, B1, and A phylogenetic groups, respectively. Of the 50 commensal E. coli isolates from the healthy group, 13 (26%), 12 (24%), 5 (10%), and 20 (40%) were allocated to the B2, D, B1, and A groups, respectively. The prevalence of E. coli in the B2 phylogenetic group was higher in people with CRC (P = 0.02). Moreover, in the control and patient groups, 3 (6%) and 6 (30%) of the E. coli strains in the B2 phylogenetic group, respectively, were isolated from people who had family members with cancer, and this difference was statistically significant (P < 0.05).
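The comparison of B2 prevalence between the two groups can be illustrated with a short calculation. The text does not say which statistical test was used, so the sketch below assumes a Pearson chi-square test without continuity correction on the 2x2 table of B2 versus non-B2 isolates; the function name `chi_square_2x2` is ours. Under that assumption the result is close to the reported P = 0.02.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Counts reported in the text: B2 vs. non-B2 isolates in each group.
crc_b2, crc_other = 11, 20 - 11            # CRC patients, 20 isolates
control_b2, control_other = 13, 50 - 13    # healthy controls, 50 isolates

chi2, p_value = chi_square_2x2(crc_b2, crc_other, control_b2, control_other)
print(f"B2 in CRC: {crc_b2 / 20:.0%}, B2 in controls: {control_b2 / 50:.0%}")
print(f"chi-square = {chi2:.2f}, P = {p_value:.3f}")
```

With an expected count below 10 in one cell, an exact test could also be considered; the choice of test here is an assumption, not the authors' stated method.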

Two-dimensional gel electrophoresis
Using the Same Spot software, we analyzed the spots in the 2DE gels of the pooled samples from the control and CRC groups qualitatively and quantitatively. When the 2DE gel images from both groups were merged, the software recognized a distinct spot. This spot was much darker and denser in the CRC 2DE gel than in the gel of the other group (Fig. 1). The chosen protein had a weight of about 30 kDa and a pI of ~6 (Fig. 2).

LC-MS outputs analysis
Based on the analysis of the results with the Mascot server (a standard tool for protein identification using mass spectrometry data), five proteins were found with high sequence homology to the target protein. The datasets used to align the sequences were obtained from the NCBI (www.ncbi.nlm.nih), UniProt, and Expasy (www.expasy.org) web servers. Of these, the three proteins with the highest scores and similarity rates were the closest matches to the target protein (Table 1).

Target protein identification by Bioinformatics analysis
After aligning the protein sequences, the malate dehydrogenase (mdh) of UPEC strain 536 showed 100% similarity to that of E. coli strain K12. However, the similarity of human mitochondrial mdh (mdh2) to the mdh of the E. coli strain was only 99%. Malate dehydrogenase is an essential enzyme in both eukaryotes and prokaryotes, including E. coli. It is one of the tricarboxylic acid (TCA) cycle enzymes and protects bacteria against oxidative stress and reactive oxygen species (ROS) [18].
In cancers, including CRC, ROS levels increase in the epithelial cells [19]. E. coli, which lives in symbiosis in the microenvironment around these cells as part of the gut microbiota, has been seen to produce malate dehydrogenase, an enzyme with 99% similarity to mdh2. Under ROS in cancer cells, mitochondrial mdh (mdh2) decreases the expression of the genes involved in apoptosis escape and metastasis [20][21][22]. Following this line of thought, because mdh inhibits cancer cells, the target protein in this study is not malate dehydrogenase; moreover, its Mascot similarity score was lower than that of the other proteins (Table 1).
Notably, in addition to the high Mascot similarity scores (Table 1), the sequence similarity of the outer membrane protein A of the E. coli K12 strain to the same protein from Shigella flexneri and Shigella dysenteriae was 93% and 85%, respectively. OmpA is found in the membrane of most bacteria, such as Enterobacteriaceae; it facilitates the bacterium's adherence to epithelial cells and has a role in protecting the bacteria and ensuring their survival [23,24].

Fig. 1 The results of the 2DE gel analyzed by Same Spot software. CRC group gel spots (black) were merged with control group gel spots. The marked spot indicates that green is predominant over pink, and the software quantitatively confirmed this result

Fig. 2 The results of 2DE electrophoresis of E. coli B2 phylogenetic group bacteria isolated from the control and CRC groups. (a) 2DE gel of E. coli proteins isolated from the CRC group, (b) 2DE gel of E. coli proteins isolated from the control group, and (c) protein size marker. The yellow dashed line shows the difference between the two gels, and the chosen protein is labeled. The red dashed line shows the approximate pI, and the blue dashed line shows the approximate weight

In silico analysis of the OmpA protein
Figure 3 shows the conformational structures of the OmpA protein of the E. coli K12 strain. The 2ge4 [25], 1g90 [26], 2jmm [27], 1qip [28], and 1bxw [29] PDB files were obtained from the PDB databank. The 3nb3 PDB file, the OmpA protein structure of Shigella flexneri, was also obtained [30]. The similarity of these proteins in the two strains was shown by merging and comparing their 3D structures in the SWISS-MODEL homology-modeling server (Fig. 3).
The overview results from the COFACTOR server confirmed the molecular function (MF) of OmpA as porin activity (GO:0015288), with a score of 0.99. Furthermore, the biological process (BP) predicted for this protein by gene ontology (GO) showed that OmpA is a transporter (C-score: 1.00) and has a role in cellular processes (C-score: 0.90) and in the response to stimulus and stress (C-scores: 0.80 and 0.79). The protein's role in the stress response could explain its overexpression in bacteria in the cancer cell microenvironment compared with bacteria around healthy intestinal epithelial cells.
The LocTree3 analysis (see Fig. 4) indicates that OmpA is an extracellular protein (score: 0.99) that can target proteins of the cytoplasm, nucleus, peroxisomes, and mitochondria (score: 0.41).
Using the protein-ligand binding site server COACH, the predicted structure matched the PDB entry 4erh [31], the crystal structure of the OmpA domain from Salmonella enterica subsp (Fig. 5). According to this model, the OmpA residues that bind the peptide are at positions 165, 166, 201, 202, 204, 205, 206, 210, 213, 217, 264, and 268, and the residues that bind nucleic acid are at positions 202, 203, 245, 247, 248, 250, 251, 255, 260, 264, and 267. In contrast, the Predict Protein database placed the DNA- and RNA-binding residues of OmpA at positions 33-100 and 33-66, respectively.
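The two COACH residue lists above can be compared directly to see which positions are predicted to take part in both peptide and nucleic acid binding. The following is a minimal sketch using only the residue numbers given in the text; the variable names are ours.

```python
# Residue positions reported for the COACH models of OmpA (from the text).
peptide_site = {165, 166, 201, 202, 204, 205, 206, 210, 213, 217, 264, 268}
nucleic_acid_site = {202, 203, 245, 247, 248, 250, 251, 255, 260, 264, 267}

# Positions predicted for both ligand types, and those unique to each.
shared = sorted(peptide_site & nucleic_acid_site)
peptide_only = sorted(peptide_site - nucleic_acid_site)
nucleic_acid_only = sorted(nucleic_acid_site - peptide_site)

print("Shared positions:", shared)
print("Peptide-only positions:", peptide_only)
print("Nucleic-acid-only positions:", nucleic_acid_only)
```

Only positions 202 and 264 appear in both lists, so the two predicted sites are largely distinct.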

Treatment of the HCT116 Cell Line with the OmpA protein
The effects of OmpA protein concentration and treatment time on the expression of the Bak, Bax, Bcl-2, and P53 genes in the HCT116 cell line were investigated using real-time PCR analysis. Treating the HCT116 cell line with OmpA protein led to a significant down-regulation of the expression of the Bak, Bax, and P53 genes (P < 0.0001). However, no significant difference was seen in the expression of Bcl-2 in the four treated groups compared with the control group (P > 0.05; see Fig. 6). As can be seen in Fig. 7, the treatment time had a greater effect on Bak, Bax, and P53 expression than the protein concentration. The analysis in Fig. 6 showed a statistically significant difference in the expression levels of Bak, Bax, and P53 between the control group and the HCT116 cell line groups treated with OmpA. The difference was significant for the cells treated for both 16 and 24 h. The results presented in Fig. 7 indicate that the 16-h treatment of HCT116 cells with OmpA had the greater effect on down-regulating the expression of these genes.

Discussion
CRC ranks third among cancers and is correlated with the gut microbiota population [1,32]. Dysbiosis of the gut microbiota contributes to tumorigenesis and metastasis in people with this type of cancer [1,2,9]. Recently, E. coli strains isolated from the gut microbiota of CRC patients were found to play a role in tumorigenesis [33,34]. Moreover, most of these strains belong to the B2 phylogenetic group [11,14], which has more severe virulence factors [35,36]. DNA damage and oncogene activation are known to trigger the intrinsic apoptotic pathway. In this situation, the tumor suppressor P53 is induced and subsequently activates the pro-apoptotic genes Bak and Bax. Apoptosis depends on the balance between Bax and Bcl-2 expression. However, Bcl-2, an antiapoptotic gene, is overexpressed in solid tumors and increases cell proliferation [37]. Overexpression of Bax or underexpression of Bcl-2 can indicate cell apoptosis [38].
These genes play an important role in the apoptotic pathway, so we assayed the effect of an E. coli protein on their expression. We analyzed the proteome of the E. coli strains isolated from the stool of CRC patients using 2DE [39,40]. The results showed that outer membrane protein A (OmpA) was significantly overexpressed in the E. coli isolated from the CRC group. This protein was characterized using LC-MS, a valid, sensitive, and accurate method for identifying bacterial proteins [40][41][42]. As shown in Fig. 3, the OmpA of Shigella flexneri has 93% similarity to the OmpA of E. coli strains. The in silico study showed that OmpA has an overall beta-barrel structure composed of anti-parallel beta-strands. These strands are connected by short turns on the periplasmic side of the membrane and by extended loops on the extracellular side [43]. On the one hand, based on the GO terms, this protein takes part in the bacterium's response to stress conditions (GO:0050896 and GO:0006950); on the other hand, in pathogenic E. coli strains, this protein is considered a virulence factor [23,36,44]. OmpA can also stimulate the immune system [24]. E. coli strains that belong to the B2 phylogenetic group have the most virulence factors [35,45,46]. It is therefore reasonable to suggest that, because of the ROS stress conditions experienced by these strains, which live in symbiosis with cancer cells, OmpA is overexpressed and emerges as a virulence factor. The LocTree3 analysis illustrated that, in addition to extracellular proteins, the proteins of some eukaryotic cell organelles, especially the nucleus and mitochondria, are targeted by OmpA (score: 0.41). One bioinformatic study that applied LocTree3 analysis to the proteins of pathogenic E. coli strains showed that these strains have many target proteins in eukaryotic organelles [47].
That study found that, over time, any alteration in the folding and function of these proteins, or DNA damage, could affect the cell cycle of eukaryotic cells infected by pathogenic E. coli strains. This, in turn, can drive the cells toward proliferation and cancer development [15,47]. In the present study, OmpA caused a significant down-regulation of the expression of the Bax, Bak, and P53 genes (P < 0.0001). According to these results and the role of Bcl-2 in the apoptosis pathway, it is logical to state that this protein does not significantly affect the expression of the Bcl-2 gene (P > 0.05).

Conclusion
This study found that overexpression of OmpA in E. coli strains of the B2 phylogenetic group isolated from the gut microbiota of patients with CRC can affect the colon cell apoptosis pathway and drive metastasis.

Bax (b), Bcl-2 (c), and P53 (d) gene expression in groups of the HCT116 cell line, normalized against the corresponding levels of β-actin. **** indicates a significant difference in the level of gene expression in the treatment groups compared with the control group (P < 0.0001). ns indicates no significant difference in the gene expression level in the treatment groups compared with the control group (P > 0.05)

Sample preparation and culture
Fecal samples were collected, before any treatment, from 20 people referred to medical centers affiliated with Shahid Beheshti University of Medical Sciences after a pathologist had diagnosed colorectal cancer during colonoscopy, in order to isolate E. coli strains. Each sample was cultured on MacConkey agar and Eosin Methylene Blue (EMB) agar plates (Merck, Germany) using a standard loop and incubated for 24 h at 37 °C. To allow a statistical comparison between the E. coli isolates of people with colorectal cancer and E. coli isolated from healthy people, stools were collected and cultured, and E. coli strains were isolated, in the same way as for the CRC patients, from 50 people without any symptoms of or relation to the disease [17].

DNA extraction and molecular confirmation of E. coli strains
We first extracted genomic DNA using the phenol-chloroform protocol [48] to identify the bacteria grown on the MacConkey agar and EMB agar plates. Then, we amplified the uspA gene (encoding the universal stress protein), an E. coli-specific gene, by PCR [49,50] under the reaction conditions presented in Table 2. Finally, we confirmed the identity of the bacteria by Sanger sequencing.

Identification of the phylogenetic groups of the isolated E. coli strains
The phylogenetic groups of all E. coli isolates were determined by PCR for the chuA (outer membrane hemin receptor ChuA), yjaA (stress response protein), and TspE4C2 (tail-specific protease) genes (Table 2) [35,51,52]. In addition, the EcoR62 strain of E. coli, which contains the chuA, yjaA, and TspE4C2 genes, was used as the positive control [17,51].

Protein preparation and SDS-polyacrylamide gel electrophoresis (SDS-PAGE)
Before performing 2DE, we compared the SDS-PAGE patterns of the proteins separated from the E. coli isolated from the two groups. After culturing the identified bacterial samples in nutrient broth, the B2 strains were pooled within each group and separated from the culture medium by centrifugation. Then, the bacterial samples were suspended in a lysis buffer (10% glycerol, Tris (pH 8), 10 mM PMSF, and 1% Triton X-100) (Merck, Germany). After sonication of the bacteria, total protein was precipitated using acetone (Merck, Germany). Next, the protein was separated by centrifugation (9056 × g). Finally, total protein was incubated with protein loading buffer (Tris, glycerol, SDS, bromophenol blue, and DTT) (Merck, Germany) at 85 °C for 5 min [53]. A 12% (v/v) SDS-PAGE gel was used for all samples, and the gel was then stained with Coomassie brilliant blue R-250 [54,55].

Protein preparation and two-dimensional gel electrophoresis (2DE)
To prepare the samples, equal volumes of 24-h cultures in nutrient broth medium (Merck, Germany) of the identified B2 phylogenetic group E. coli isolates were pooled separately for the control (13 isolates) and CRC (11 isolates) groups. The bacterial pellets separated from the culture media were suspended in lysis buffer (Tris-HCl, EDTA, urea, 10% glycerol, DTT, and NP-40). The concentration of the extracted E. coli protein samples was 2.6 µg/mL for both the control and CRC groups, and the sample volume used for both groups was 10 µL. Seven-centimeter immobilized pH gradient (IPG) strips with a nonlinear pH range of 3-10 (BioRad, USA) were used for isoelectric focusing (IEF). The IPG strips were rehydrated overnight. The volume of rehydration buffer (7 M urea, 2 M thiourea, 4% CHAPS, 0.2% Bio-Lyte pH 3-10, 50 mM DTT, and a trace amount of bromophenol blue) was 115 μL for both the control and test groups. Focusing was carried out in the Protein IEF cell (BioRad) with a linear increase from 0 to 250 V for 20 min, followed by a linear rise to 14000 V. Afterward, the IPG strips were equilibrated in 3 ml of buffer (50 mM Tris-HCl pH 8.8, 6 M urea, 20% glycerol, 2% SDS, 0.01% bromophenol blue, and 2% DTT) for 15 min. Then, 3 ml of equilibration buffer without DTT, supplemented with 2.5% iodoacetamide, was used to alkylate the samples (15 min). For the second dimension of electrophoresis, the equilibrated strips were placed on top of a 12% SDS-PAGE gel, and electrophoresis was run for 90 min at 100 V. Finally, the gels were stained with Coomassie brilliant blue R-250 [56,57]. The 2DE gels were analyzed and the target protein was selected using the Same Spot software, version 5.1.012 (U.K.).

Extraction of protein spot from the 2DE gel
First, the target protein spot was carefully cut from the 2DE gel and crushed with an applicator, and sterile deionized water was added to dissolve the protein. Next, the protein solution was incubated for 1 h at 37 °C. Then, the solution was passed through a ClearSpin filter microtube with a pore size of 0.22 µm (CLEARLINE Co., US) and centrifuged at 16,099 × g for 3 min. Lastly, the supernatant containing the target protein was collected. The protein solution was sterilized using a 0.45 µm filter [58], and the protein concentration was measured with a NanoDrop spectrophotometer (0.60 mg/ml).

Protein identification by liquid chromatography-mass spectrometry (LC-MS)
In the first stage, the protein sample was reduced using DTT and digested with trypsin, and the peptides were then extracted according to standard protocols. The process used an automated liquid sampler, quaternary pump, degasser, column compartment, and a diode array detector (monitored at 220 nm) controlled by the ChemStation© software (Agilent Technologies) on a Hewlett Packard Series 1100 HPLC. Finally, mass spectrometry was performed using a 4800 MALDI TOF/TOF™ analyzer (Applied Biosystems/MDS SCIEX) [59,60]. The results were analyzed using the Mascot server (http://www.matrixscience.com/).

Bioinformatics studies on LC-MS analysis results
The peptide sequences returned by the Mascot server (Table 1) were aligned using UniProt (www.UniProt.org/) and BLASTP (www.ncbi.nlm.nih/) to confirm and identify the target protein. The FASTA-format protein sequences of the bacteria were used for the alignments and comparison.

Bioinformatics studies on the E. coli target protein
The 3D structure of OmpA, determined by NMR or X-ray crystallography, is available in PDB format through the UniProt database. The protein's three-dimensional structure in the E. coli K12 strain and Shigella flexneri was therefore evaluated by aligning the primary structures of the proteins using the SWISS-MODEL server (https://swissmodel.expasy.org/repository/uniprot/). We also used the COFACTOR server (https://zhanglab.ccmb.med.umich.edu/COFACTOR2/) to find the molecular function (MF) and biological process (BP) of E. coli OmpA, which are parts of the gene ontology (GO). We used the LocTree3 server (https://rostlab.org/services/loctree3/), which relies on a support vector machine (SVM), to predict the localization of the OmpA protein binding sites in eukaryotic cell organelles [47]. Using the Predict Protein Database (https://predictprotein.org/), we identified the OmpA surface residues that can bind to DNA and RNA. The COACH server (https://zhanglab.ccmb.med.umich.edu/COACH/) predicted two 3D models of OmpA, one bound to a peptide and one bound to nucleic acid, and determined the binding position of each of the OmpA residues. COACH is a meta-server approach to protein-ligand binding site prediction. Starting from the given structure of the target protein, COACH generates complementary ligand binding site predictions using two comparative methods, TM-SITE and S-SITE, which recognize ligand-binding templates from the BioLiP protein function database by binding-specific substructure and sequence profile comparisons.

Cell culture
We used the human colorectal cancer HCT116 cell line (ATCC) as an in vitro model to investigate the effect of the bacterial protein. A total of 10^5 cells were seeded in a 6-well plate and cultured in RPMI 1640 medium containing 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin at 37 °C with 5% CO2 [58,61].

Treatment of HCT116 colon cancer cell line with the target protein
The cells were divided into five groups for the present study. Group one was treated with 0.006 mg/mL (10 µL) target protein and incubated for 16 h. Group two was treated with 0.012 mg/mL (20 µL) target protein and incubated for 16 h. Group three was treated with 0.006 mg/mL (10 µL) target protein and incubated for 24 h. Group four was treated with 0.012 mg/mL (20 µL) target protein and incubated for 24 h. The last group was the control group, which received no treatment.
Before any treatment with the identified protein, the complete RPMI medium was replaced with FBS-free RPMI culture medium.
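The relationship between the added volumes and the final concentrations can be checked with the dilution formula C1 × V1 = C2 × V2. The sketch below takes the stock concentration of 0.60 mg/ml measured after gel extraction and assumes a total volume of 1 mL per well; that volume is not stated in the text and is our assumption, chosen because it reproduces the reported concentrations.

```python
STOCK_MG_PER_ML = 0.60     # concentration measured after gel extraction (from the text)
FINAL_VOLUME_UL = 1000.0   # ASSUMPTION: total volume per well, not stated in the text

def final_concentration(stock_mg_per_ml, added_ul, final_volume_ul=FINAL_VOLUME_UL):
    """Return the concentration after dilution, using C1 * V1 = C2 * V2."""
    return stock_mg_per_ml * added_ul / final_volume_ul

# The two added volumes used for the treated groups.
for added_ul in (10, 20):
    conc = final_concentration(STOCK_MG_PER_ML, added_ul)
    print(f"{added_ul} µL of stock -> {conc:.3f} mg/mL")
```

If the protein was instead added on top of 1 mL of medium, the final volumes would be 1010 and 1020 µL and the concentrations marginally lower.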

RNA extraction, cDNA preparation, and genes expression analysis by Real-time RT-PCR
The RNA of the cells in each group was extracted using a kit (All Gene ® -Hybrid-RTM, Korea), and the cDNA templates were synthesized from 1 µg RNA with the oligo dT primer and reverse transcriptase enzyme (Fermentas, USA). The expression levels of four apoptosis genes (Table 3) were analyzed in the treated and control HCT116 cells. The real-time PCR reaction mixture had a total volume of 12 µl and contained 6 µl SYBR Premix EX Taq (2X) Master Mix, 0.25 µl ROX (50X) (Takara, Japan), and 1 µl (10 µmol) forward and reverse primers. β-actin was used as the internal reference gene. The mRNA expression levels were evaluated by the comparative Ct quantification method (∆∆Ct) [38].
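The comparative Ct calculation can be sketched as follows. The Ct values in the example are hypothetical and are not data from this study; the function name is ours, and the formula assumes that the amplification efficiency of the target and reference genes is close to 100%, as the ∆∆Ct method requires.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the comparative Ct (delta-delta Ct) method."""
    # Normalize the target gene to the reference gene (here beta-actin) in each sample.
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the untreated control.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2 ** (-delta_delta_ct)

# HYPOTHETICAL Ct values for illustration only (not measurements from this study).
fold_change = relative_expression(
    ct_target_treated=27.0, ct_ref_treated=17.0,   # e.g. Bax and beta-actin, OmpA-treated
    ct_target_control=25.0, ct_ref_control=17.0,   # e.g. Bax and beta-actin, control
)
print(f"Relative expression (treated vs. control): {fold_change:.2f}")
```

A fold change below 1 indicates down-regulation in the treated cells relative to the control, and a value above 1 indicates up-regulation.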